Entity recognition and resolution in semi-structured data

Author

  • Nuno Freire
Abstract

Potentially usable business information exists in unstructured form. This information, although machine readable, resides in unstructured human language texts that are difficult to process by computers. Within this information are references to real world entities, which are the focus of this paper. More specifically, we address the recognition of references to entities and their resolution, in the context of semi-structured data. This kind of data is structured according to a model which defines only generic semantics for its data elements, or includes data elements that contain natural language text. Semi-structured data presents new challenges and opportunities. In this kind of data, grammatical evidence is very often insufficient for entity recognition, since short sentences and simple expressions are predominant. However, the contextual information given by the structure of the data opens new possibilities for innovative techniques. We propose an approach to integrating the support for structured data throughout the complete process. It will be evaluated by studying three entity types in two scenarios with different semi-structured data formats, each one with distinct characteristics.

Similar resources

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Entity Type Recognition for Heterogeneous Semantic Graphs

We describe an approach to reducing the computational cost of identifying coreferent instances in heterogeneous semantic graphs where the underlying ontologies may not be informative or even known. The problem is similar to coreference resolution in unstructured text, where a variety of linguistic clues and contextual information is used to infer entity types and predict coreference. Semantic g...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not-linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Semi-supervised structured prediction models

Learning mappings between arbitrary structured input and output variables is a fundamental problem in machine learning. It covers many natural learning tasks and challenges the standard model of learning a mapping from independently drawn instances to a small set of labels. Potential applications include classification with a class taxonomy, named entity recognition, and natural language parsin...

Journal title:
  • TCDL Bulletin

Volume 8  Issue 

Pages  -

Publication date 2012